Improving Credibility of Machine Learner Models in Software Engineering
Abstract
Given a choice, software project managers frequently prefer traditional decision-making methods to empirical software engineering models (empirical or machine-learning-based models). One reason for this preference is the perceived lack of credibility of these models. To promote better empirical software engineering, a series of experiments is conducted on various NASA datasets to demonstrate the importance of assessing how easy or difficult a modeling situation is. Each dataset is divided into three groups: a training set and two test sets of “nice” and “nasty” neighbors. Using a nearest-neighbor approach, “nice neighbors” are test instances that lie closest to training instances of the same class, while “nasty neighbors” lie closest to training instances of the opposite class. The “nice” and “nasty” experiments average 94% and 20% accuracy, respectively. Another set of experiments shows that ten-fold cross-validation alone is not sufficient to characterize a dataset. Finally, a set of metric equations is proposed to improve the credibility assessment of empirical and machine learning models.
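The abstract does not spell out the exact partitioning procedure, so the following is only a minimal sketch, assuming a 1-nearest-neighbor rule with Euclidean distance over numeric features: a test instance is “nice” if its nearest training instance has the same class and “nasty” otherwise. The function names (`nearest_label`, `split_nice_nasty`, `centroid_classifier`, `accuracy`) and the toy data are hypothetical. If the same 1-NN rule were also used as the classifier, nice and nasty accuracy would be 100% and 0% by construction, so the sketch evaluates with a deliberately simple stand-in learner (nearest centroid); this is not the learner used in the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]
Labeled = Tuple[Point, int]


def nearest_label(x: Point, train_X: Sequence[Point], train_y: Sequence[int]) -> int:
    """Class label of the training instance closest to x (Euclidean distance)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    return train_y[best]


def split_nice_nasty(test_X, test_y, train_X, train_y) -> Tuple[List[Labeled], List[Labeled]]:
    """Assumed rule: 'nice' if the nearest training neighbor shares the test
    instance's class, 'nasty' otherwise."""
    nice: List[Labeled] = []
    nasty: List[Labeled] = []
    for x, y in zip(test_X, test_y):
        (nice if nearest_label(x, train_X, train_y) == y else nasty).append((x, y))
    return nice, nasty


def centroid_classifier(train_X, train_y) -> Callable[[Point], int]:
    """Hypothetical stand-in learner (nearest centroid), not the paper's learner."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(train_X, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
    # Predict the class whose centroid is closest to x.
    return lambda x: min(centroids, key=lambda y: math.dist(x, centroids[y]))


def accuracy(predict, pairs: List[Labeled]) -> float:
    """Fraction of (x, y) pairs classified correctly; NaN for an empty set."""
    if not pairs:
        return float("nan")
    return sum(predict(x) == y for x, y in pairs) / len(pairs)


# Hypothetical toy data: two well-separated clusters, plus two test points
# whose labels disagree with their neighborhood.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.5, 0.5), (5.5, 5.5), (4.8, 4.9), (0.2, 0.1)]
test_y = [0, 1, 0, 1]

nice, nasty = split_nice_nasty(test_X, test_y, train_X, train_y)
predict = centroid_classifier(train_X, train_y)
print(len(nice), len(nasty), accuracy(predict, nice), accuracy(predict, nasty))
# prints: 2 2 1.0 0.0
```

Even on this toy data the pooled accuracy over all four test points is 50%, which hides the fact that the learner is perfect on the nice set and fails completely on the nasty set. Any other classifier can be evaluated the same way by replacing `centroid_classifier`.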
Similar Resources
CREDIBILITY-BASED FUZZY PROGRAMMING MODELS TO SOLVE THE BUDGET-CONSTRAINED FLEXIBLE FLOW LINE PROBLEM
This paper addresses a new version of the flexible flow line problem, i.e., the budget-constrained one, in order to determine the required number of processors at each station along with the selection of the most economical process routes for products. Since a number of parameters, such as due dates, the amount of available budgets and the cost of opting for particular routes, are imprecise (fuzz...
When Will It Be Done? Machine Learner Answers to the 300-Billion-Dollar Question
“When will it be done?” Senior managers will ask their software project managers this question more than 250,000 times this year. Corporations, which collectively commit over US$300 billion annually toward new software project initiatives [1], will want to know the answer. However, when you consider Barry Boehm's claim that early software life-cycle estimates vary by a factor of four (25 to 400 ...
Software Quality Modeling with Limited Apriori Defect Data
In machine learning, the problem of limited data for supervised learning is a challenging one with practical applications. We address a similar problem in the context of software quality modeling. Knowledge-based software engineering includes the use of quantitative software quality estimation models. Such models are trained using a priori software quality knowledge in the form of software me...
Software Engineering and Simulation Credibility
Most people think of “validation” as the hallmark of simulation credibility. But some simulations, by their very nature (e.g., mission level models, highly complex physics-based simulations, etc.) are notoriously difficult to validate. There are also situations in which the process of validation, even if feasible, cannot keep pace with the dynamic nature of simulation evolution, or where the co...
Improving the Inference of Gene Expression Regulatory Networks with Data Aggregation Approach
Introduction: The major issue for the future of bioinformatics is the design of tools to determine the functions and all products of single-cell genes. This requires the integration of different biological disciplines as well as sophisticated mathematical and statistical tools. This study revealed that data mining techniques can be used to develop models for diagnosing high-risk or low-risk lif...